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Abstract 

Parameter estimation in diffusion processes from discrete observations 
up to a first-hitting time is clearly of practical relevance, but does not seem 
to have been studied so far. In neuroscience, many models for the mem- 
brane potential evolution involve the presence of an upper threshold. Data 
are modeled as discretely observed diffusions which are killed when the 
threshold is reached. Statistical inference is often based on the misspeci- 
fied likelihood ignoring the presence of the threshold causing severe bias, 
e.g. the bias incurred in the drift parameters of the Ornstein-Uhlenbeck 
model for biological relevant parameters can be up to 25-100%. We cal- 
culate or approximate the likelihood function of the killed process. When 
estimating from a single trajectory, considerable bias may still be present, 
and the distribution of the estimates can be heavily skewed and with a 
huge variance. Parametric bootstrap is effective in correcting the bias. 
Standard asymptotic results do not apply, but consistency and asymp- 
totic normality may be recovered when multiple trajectories are observed, 
if the mean first-passage time through the threshold is finite. Numerical 
examples illustrate the results and an experimental data set of intracellu- 
lar recordings of the membrane potential of a motoneuron is analyzed. 

Keywords: sequential estimation, diffusion processes, first-passage times, 
stochastic neuron models, model misspecification, bootstrap 
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1 Introduction 



In many applications data from a process are sampled sequentially up to the 
first time it hits an upper boundary, where some macroscopic event occurs, 
which renders further observations impossible. A prominent example and the 
one that triggered the authors interest in the problem comes from neuroscience, 
where the membrane potential of a neuron is believed to be well described by 
a diffusion process between the times where the neuron fires. The neuron fires 
when the membrane potential reaches some upper threshold, and thus, discrete 
observations of the diffusion process are obtained as long as the process has 
not crossed the threshold. In this context, statistical inference is often based 
directly on the longitudinal sample, ignoring the p resence o f the t h reshold and 



that t h e sample size i s now a ran d om variable, see iHopfnerl (|2007l ); iJahn et al 



( 201 if) : lLanskv et ail (|2006l l201dh : IPicchini et ail (j2008|). Such model misspec- 
ification, together with the unavoidable shortn ess of the observed paths, ca uses 
severe bias in the estimates of drift parameters ( Bibbona et al. ( 2008 , 2010h and 
Section [3] below), providing a strong motivation to find a dedicated statistical 
method. Another example comes from biomedicine, where a patient in danger 
of suffering some attack or death, is regularly monitored, e.g. blood pressure or 
some substance in the blood, but only when it is between some critical limits, 
whereafter the subject enters a major crisis. The problem also occurs when- 
ever parameter esti mation follows som e sequential data acquisition, for exam- 



Bibbona and Rubba. 



pie a sequential test ( Whitehead! . 119861 ) or a clinical trial (IJung and Kiml . 12004 ; 



20111 ). Sequential estimat ion for i.i.d. samples is a well es- 



tablished field of statistics (jGhosh et all 119971) . but to the authors knowledge, 



for diffus i on pro cesses it has on l y been studied f rom continuous obser v ations 
( Novikovl Il972t ISgrensenL Il983t iflozahskil Il989t iLiptser and Shirvaevl l200ll) 
and mainly with the aim of finding optimal sampling plans. 
For a killed process, no more information can be obtained onc e the threshold i s 
reached, and thus, standard asymptotic results do not apply. In S0rensen ( 1983f ). 
the likelihood function for a randomly stopped diffusion process is calculated 
when the process is continuously observed and consistency and asymptotic nor- 
mality of the MLE ar e discussed for an increasing sequence of stopping times 
tending to infinity. In lBhatl ( 196lh . regular Markov chains with a finite number 
of states (and one absorbing) is analyzed and asymptotic results are derived 
when a large number o f trajectories is available. We will extend this result fur- 
ther. In iFerebed (Il983l) . an unbiased estimator is found for the drift parameter 
of a Wiener process with drift (WD) with unit variance scaling observed con- 
tinuously up to a stopping time. 

The rol e of t he threshold in neuron al modeling was first emphasized in lPaninski 
(|2006al lbh. In lGiraudo et al. ( 2011 ) the stochastic differential equation for a dif- 
fusion process constrained to remain below the threshold was derived. 
In this paper we will study maximum likelihood sequential estimation from a 
diffusion process discretely observed up to a first-hitting time. In Section [2] 
the likelihood function is introduced. It is intractable for practically all models 
except the WD, and we point at some approximations for evaluating it when 
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the sampling frequency is high. Examples are provided for some models which 
are of practical interest in many fields such as WD, the Ornstein-Uhlenbeck 
(OU) and the Square Root (SR) processes. Not surprisingly, when only one 
trajectory is used in the statistical analysis, estimates are still biased, widely 
variable and skewed. Bootstrap bias correction may sometimes improve the es- 
timates. To get better estimates one needs to enlarge the sample by considering 
many independent trajectories. Standard Cramer conditions for consistency and 
asymptotic normality when the number of trajectories used in the estimation 
goes to infinity are easily stated, but they are impractical to check. In Section 
3] we propose some easier equivalent conditions with a nice probabilistic inter- 
pretation in terms of an associated regenerative process. An asymptotic scheme 
is hence pointed out that allow to find good estimates based on the likelihood 
function. This solution is of practical relevance in neuroscience where numer- 
ous trajectories are usually available and estimating from the global sample 
resolves any issue. In Section [5] simulation studies are carried out to illustrate 
the theoretical results and in Section [5] an experimental data set of intracellular 
recordings of the membrane potential of a motoneuron during mechanical stim- 
ulation obtained from an isolated carapace-spinal cord preparation from adult 
turtles is analyzed. Finally, in Section [7] we summarize our findings and discuss 
some further directions not treated in the paper. An Appendix provides a few 
numerical details. 

2 The likelihood function 

Assume that the discrete time Markov process {A^j-j^ defined on a proba- 
bility space (CI, J 7 , P e ) is made of observations from a diffusion process X t at 
equidistant time points iA, i = 0, 1, . . . , n for some fixed A > 0. Thus, we write 
Xi = XiA ■ Assume that X t is the solution to the stochastic differential equation 

dX t ^ fi(X t ;9)dt + a(X t ;9)dW t ; X = x (1) 

where W is a Wiener process. The drift and diffusion functions fj,(-) and a(-) 
are assumed known up to the parameter 9, which belongs to the parameter 
space C R p . They are assumed to be smooth enough to ensure for every 9 
uniqueness in law of the solution. Let fg(y,A\x) denote the transition density 
for a given value of 9, i.e. the conditional density under Pg of X t +A given that 
Xt = x. Assume that the interval (I, r) C R, for some — oo < I < r < oo, is the 
smallest one such that 

Pf| {X t e(l,r)) =1 
t>o 

and thus the state space is E = (I, r). 

Introduce an absorbing threshold at level b with I < xq < b < r, and 
assume that the initial value X is fixed and equal to .t <E Ef, = (i, 6). To 
avoid heavy notations we assume the threshold level is constant, but there is 
no technical obstruction to threat time-varying thresholds too. Since we have 
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incomplete observations of a continuous process, we cannot in general ensure 
that the continuous process has not crossed the threshold between observations, 
even if all observed points are below the threshold. However, the statistical 
problem is interesting exactly for those applications where the crossing of the 
threshold causes some macroscopic event that is always observed, and such that 
no further observations can be obtained. We assume that we are in this setting, 
thus the interval where the first-hitting time occurs is always observed. 
Define the stopping times 

T b = inf{i > : X t > b} 
N = inf{n £ N : nA > T b }. 

Throughout the paper, we assume that P(T b < oc | Xq = xq) = P(N < 
oo | Xq = xq) = 1. Let X£ be the process that coincides with X t as long as 
X t < b and is killed the first time X t crosses the threshold, whereafter it goes 
into a coffin state C which is artificially added to the state space, E b U C . Let 
Xf be its discretization, i.e., 



X 



X t for i < N 
C for i > N. 



The time homogeneous process X t which is at a point x < b at time 0, will at 
a later time A have crossed the threshold with some probability. We denote 

P 8 (T b < A\X = x) = G b e (A\x) = / g b (r\x) dr (2) 

Jo 

where g b (r\x) is the density of T b given that Xq = x. It may stay below the 
threshold at all times before A, and at time A be in state Xa = V < b with 
probability "density" 

/ e b (y, A|x) = ^-P e (X A < Ul T b > A\X = x) . (3) 

OU u=y 

The word probability density is slightly improper for fg(y, A\x) since 

f b e (y,A\x)dy = l-G b e (A\x). 

E b 

The killed process X\ allows for the following properly defined transition density 
f${y,A\x) = f b B { yi A\x) ■ t {x , yeEb} + G b e (A\x) ■ t {xeEbty=C } + H*, y =C} (4) 

w.r.t. the measure A © Sq defined on E b U C , where A is the Lebesgue measure 
on R and Sc is the Dirac measure in C. Here 1a denotes the indicator function 
of the set A. 

Observing the original discretized process Xi sequentially up to the stop- 
ping time N — 1 is equivalent to observe the killed process Xf infinitely, since 
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from the first visit to C no more information is gained. In both cases we 
can interpret each observed trajectory as a realization of a single random vari- 
able. Indeed, an observation (xi, . . . , x n -\) may be seen as a sequential sample 
X(w) = (x\, . . . , xjv-i) with random length N(uj) — 1 = n — 1 from the pro- 
cess Xi, but also as a single realization of a random variable whose values are 
(n — l)-tuples of elements of the state space Eb for some n — 1, free to vary from 
one to any natural number. Accordingly, wc have 

oo 

X:£l—>[J(E b y. (5) 

j=i 

Alternatively, the same trajectory can be interpreted as an infinite set of ob- 
servations X k (uj) = (x\, . . . , .t„_i, C, . . .) from the killed process X k where C 
is always observed from position n and onwards. In this sense it is a single 
realization of a random variable 

X k : n — (£ 6 UC)°°. (6) 

Whichever interpretation is preferred, the likelihood of X(u)) = (x\, . . . ,xjv_i) 
is given by 

oo oo tl— 1 

L(X; 0)=H fS(xi, A|xi_i) = II fe{xiA\xi-i) ■ ^(A|x n _i) • 1 {JV= „ } 

i=X n—1 i=X 

(7) 

for xo € Eb and (xi, . . . ,x„) £ (Eb U C)". However, in the sense of (O it is a 
density w.r.t. the dominating measure ©^A* where A z are Lebesgue measures 
on IR\ while in the sense of © it is a density w.r.t. the infinite power (X(B5c)°°- 
The functions fg(y, A|x) and Gg(A\x) are not generally known in explicit 
form, except for Brownian motion killed at a constant threshold and a few 
other cases. Nevertheless, when high frequency data are available, i.e. when 
A is small, some methods that were introduced for different purposes may be 
applied to get a reliable approximation, which is computationally simple enough 
to be implemented within an optimization algorithm. That is the subject of the 
following two Subsections. 

2.1 Transition density for small sampling intervals 

The transition density and fg(y, A|x) are related by the following equation 

f b g (y, A|x) = fg(y, A|x) • (1 - P(T b < A |x, yj) (8) 

where P(T b < A \x,y) denotes the probability that the process X condi- 
tioned upon being in x at some time t and in y at t + A crosses the thresh- 
old fo r the first time between those two points. In IBaldi and Caramellinol 
( 2002 ); Giraudo and Sacerdote ( 1999() two computationally efficient methods 



were proposed and compared to approximate this probability when A is small. 
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The main goal was to design a simulation scheme for killed processes (Ap- 
p endix IA.1I). Both meth o ds ca n be applied to our problem. We choose that 



of iBaldi and Caramellinol (|2002j ) since it leads to a faster numerical evaluation. 
This is particularly important since we need to locate the maximum of the like- 
lihood function by means of a numerical optimization algorithm that requires 
multiple evaluations. Assume the drift and diffusion functions in ([T]) to be 
fi G C 1 (int(£ , b )), cr G C 2 (mt(£: fc )) and a(z) > for z £ mt(E b ). For x,y G E b 
and small A we apply the following approximation 



P(T b < A \x,y) «exp 



2 
A 



dz 



dz 



a{z) J a(z) 



(1 + Acj) b ) 



where 



and 



( 1 / rv Hz)dz 




b \{z)dz | fb \(z)dz 



rb X{z)dz rb 
Jx a(z) ~^Jy 



rb dz i fb dz 
(z) Jo: ■oTZJ+Jy STJj 

rb \{z)dz 

rb dj~ 

Jy trfz) 



\<?(y) 



if y + x 

if y = x 



where we assume A(-) locally Lipschitz continuous on int(E b ). 



2.2 Conditional distribution of the first-hitting time for 
small sampling intervals 



The density g b {r\x) satisfies the following integral equation (jBuonocore et al 

Hast, 



g b {r\x) = -2^ b {r\x)+2 g b (r\x) * 6 (r - r|6) dr, (9) 
Jo 

where the function 4 f b(r|x) is defined as follows, 
^b(r\x) = ^(X r < b I X = x) + i - la' (6) J / e (X r = 6 1 X = z). 

A crude approximation for small r that proves surprisingly efficient in practice 
is to neglect the integral in © and approximate 

rA rA 

G b e (A\x)= g b {r\x)dr^-2 V b (r\x)dr 
Jo Jo 

1 ^ ' A 



= 2P(X r >b\X = x)-\u(b)--a'(b)j J f e (X r = b\X = x)dr 

(10) 

allowing for reasonably fast evaluations in the optimi zation algorithm (Appendix 
A. 3D. More precise approximations may be achieved ( Sacerdote and Tomassettil 
19961 ). but they require much heavier computational efforts. 
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3 Examples 



Here wc present in details a few examples showing that the general theory can 
effectively be applied for processes which are of practical relevance in many 
fields. For each example a simulation study is performed to evaluate the quality 
of the estimators. We generate 10,000 samples of one trajectory, stopped at 
the first crossing time of a threshold. Details on how to properly detect the 
first-passage times of discretely simulated paths are given in Appendix IA.11 
For each trajectory estimates are computed by numerical maximization of the 
log-likelihood function. A few numerical details are provided in Appendix fA] 



3.1 Wiener with positive drift 

The WD is one of the few cases where ([2]) and (j3|) are explicitly known for any 
value of A (the high frequency data assumption is not needed in this case). 
Consider X t given by 

dX t = fidt + adW t (11) 
with [i and a positive and Xq = xq < b. We have ( Sacerdote and Giraudol 



2012) 



f b g (y,A\x) = f 9 (y,A\x) 



and 



G» e (A\x)=-{ 1-erf 



b — x — fiA 

\/2Acr 



1 — cxp 



2(b-y)(b-x) 



exp 



2fx(b-x) 



2 A 



1 - erf 



(12) 



b- 



12A0 



where erf(-) is the error function. 

The initial point is fixed at xq = 0, the threshold is at b = 10 and the 
simulation step is A = 1. Results are presented in Table [TJ The drift parameter 
/i is overestimated up to 200% with increasing bias when adding more noise. 
The distribution of fi is skewed with a long right tail. 



3.2 Ornstein-Uhlenbeck process 

The OU process is used as a model for many phenomena in physics, biology, 
engineering and finance. In particular, it is used in neuroscience as the most 
tractable version of the Leaky Intcgrate-and-Fire models to represent the evo- 
lution of the membrane potential of a neuron between two spikes. A spike is 
generated whenever the process reaches a firing threshold and then the mem- 
brane potential is reset immediately to a fixed value xq (the resetting potential). 
Then the evolution starts anew with the same law independently of the past. 
The OU process is solution to the stochastic differential equation 

dX t = {-pX t +n)dt + adW t (13) 
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Tabic 1: WD model. Results of the simulation study. Value used are Xq 
b = 10 and A = 1. 





par 


true 


avg(<?) 


av g (e)-e 


(2.5%, 97.5%) 


avg(AT) 




s 


CASE 1 


A 1 
a 


0.3 
0.5 


0.326 
0.488 


0.086 
-0.024 


(0.180, 0.541) 
(0.365, 0.612) 


33.72 


CASE 2 


A 1 

<7 


0.3 
1.5 


0.520 
1.445 


0.734 
-0.037 


(0.088, 1.631) 
(0.932, 1.912) 


34.28 


CASE 3 


A* 

(7 


0.1 
0.5 


0.125 
0.496 


0.247 
-0.009 


(0.045, 0.274) 
(0.419, 0.576) 


100.73 


CASE 4 


A* 
(J 


0.1 
1.5 


0.320 
1.467 


2.203 
-0.022 


(0.019, 1.281) 
(1.048, 1.844) 


101.61 



with B and a positive. It is gaussian with conditional mean and variance 
E(X 4 |X = so) = xoe-P* + £ (1 - e-^) , Var(X 4 |X - x ) = ^ (l - e" 2 ' 34 



■16 



(14) 



Thus, E(Aroo) = p,/0. A threshold is imposed at level b > xq = 0. The mean 
first-passage ti me through th e threshold is finite and can be calculated according 
to formulas in ISieeertl (l95ll ): iRicciardi and Satol (|l988l ). 

Parame ter estimation for this mo d el from neuronal data was performed for 
example in lLanskv et all (|2006l . l2010h ; iPicchini et al. (2008) by maximizing the 
product of the gaussian transition densities of the underlying unconstrained OU 
process. This we will call naive maximum likelihood, and it neglects the role o f 
the threshold, and thus, the likelihood is misspecified. In iBibbona et all (|2008l ) 
it was shown that considerable bias may occur by estimating in this way. 

The likelihood function ([7]) can be approximated as in Sections 12.11 and 12.21 
We have 



P(T b <A\x,y)nexp - 



2(b -x)(b-y) 
a 2 A 



(1 + A&) 



(15) 



with 



0{b- x)(b - y){f3b + fix + fiy - 3/a) 



3a 2 (2b-x-y) 

to be substituted in formula (|8|), and Gg(A\x) may be d erived evaluating formul a 
(jTDJ) . The function ^(rla;) was calculated explicitly in lBuonocore et al.1 (jl987l ). 
but its numerical integration takes much longer time than the evaluation of 
the normal cumulative distribution function in the first term and numerical 
integration in the second summand of the right hand side of (|10[) . 

In Table [2] we present the results of a simulation study with 4 different pa- 
rameter settings, in which we evaluate the performance of the MLE for (/3, /j,, a) 
for the fully specified model and compare it with the naive estimation. 



Table 2: OU model. Results of the simulation study. 9 indicates either /x, j3 or 
cr. Values used are xp = and b = 10. 





true values 


method 


par 


avg(6>) 


avg(0)-e 


sd(0) 


(2.5%, 97.5%) other values 




f) 








/' 




ft 7^9 




(ft 9(17 9 1 fil "i 






ijiivemiooQ 


a 
P 


U.UOl 


ft fi9^ 


n n77 


(ft ftftft ft 97f\\ ¥(Y \ 8 fi 

(^U.UUU, U.Z/Oj LH^AqqJ — O.O 


CASE 1 


/3 = 0.05 




(7 


1.198 


-0.001 


0.053 


(1.091,1.303) A = 0.1 


cr = 1.2 




[I 


0.779 


0.813 


0.532 


(0.207, 2.254) E(N) =417.8 






naive 


P 


0.091 


0.815 


0.081 


(0.000, 0.294) 








a 


1.197 


-0.003 


0.053 


(1.087, 1.301) 










1.339 


0.339 


0.600 


(0.584, 2.822) 




M = l 


Likelihood 


P 


0.063 


1.550 


0.082 


(0.000, 0.277) E(X oo )=40 


CASE 2 


P = 0.025 




a 


0.993 


-0.007 


0.070 


(0.855,1.135) A = 0.1 


cr = 1 






1.373 


0.373 


0.630 


(0.581, 2.915) E(N) = 114.1 






naive 


P 


0.077 


2.062 


0.089 


(0.000, 0.300) 








a 


0.990 


-0.010 


0.071 


(0.851, 1.133) 










2.638 


0.319 


1.370 


(0.931, 6.150) 




/i = 2 


Likelihood 


P 


0.282 


0.173 


0.218 


(0.000, 0.790) E(X«,)=8.33 


CASE 3 


/3 = 0.2 




a 


1.686 


-0.008 


0.126 


(1.428, 1.933) A = 0.1 


cr = 1.7 




M 


2.783 


0.392 


1.445 


(0.967, 6.457) E(N) = 126.8 






naive 


P 


0.322 


0.344 


0.229 


(0.000, 0.869) 








a 


1.679 


-0.012 


0.127 


(1.420, 1.929) 










8.151 


0.019 


1.315 


(5.775, 10.934) 




H = 8 


Likelihood 


P 


1.003 


0.003 


0.177 


(0.657, 1.361) E(Aoo)=8 


CASE 4 


= 1 




a 


1.011 


0.011 


0.108 


(0.814, 1.253) A = 0.49 


a = 1 






8.326 


0.041 


1.322 


(5.944, 11.156) E{N) = 121.6 






naive 


P 


1.033 


0.033 


0.175 


(0.698, 1.393) 








a 


0.984 


-0.016 


0.114 


(0.734 1.205) 



When the asymptotic level E(X oa ) = n/fi is above b is denoted supra- 
threshold regime, while sub-threshold regime when it is below. In Case 1 pa- 
rameters are chosen in sub-threshold regime with values compatible with those 
expected for t h e mem brane potential of a neuron during spontaneous activity 
( Lanskv et al. (2006), values are expressed in units of milliseco nds and milli 
volts) . In Case 2 they are chosen according to the ones estimated in lLanskv et al 



(2010) during stimulation, yielding supra-threshold dynamics. Cases 3 and 4 
are illustrative of different ranges. In case 4 a larger simulation step is used to 
test if the approximation is still acceptable. 

The two methods provide similar estimates, but the naive method performs 
slightly worse. The superiority of the MLE will become apparent in Section I5TT1 
Table [2] shows that p, displays a skewed distribution with a heavy right tail, and 
that (3 is relatively more variable compared to the ot her two parameters. Esti- 
mates of a are good whatever method is applied. In iBibbona et al.l (|2008l ) the 
bias incurred in /t from naive estimation of fi and a when (3 is assumed known, 
was evaluated approximately to be a 1 jb. This value largely underestimates the 
true magnitude of the bias when /3 also has to be estimated. 



3.3 Square root process 

The SR process is the solution to the following stochastic differential equation 



dX t = {-0X t + /i) dt + ay/XtdWt 



(16) 



with j3 and a positive. We assume 2/i > <r 2 , in which case the boundary is 
entrance, following Feller ' s clas sification. The SR process was first studied in 
pure mathematics ([Fellerl . 119511 ) as an example of a singular diffusion process, 
and it gained pop ularity under the n ame of the CIR process in finance as a model 



for interest rates ( Cox et al. . 19851) . In neuroscience it has been proposed as a 



Leaky Integ rate- and-Fire model for the membrane potential evolution between 
two spikes (jGiorno et all 119881 ). It is considered more realistic than the OU 



process since it is bounded from below. The transition density is a non-central 
chi-square distribution with non centrality parameter 



A 



-pt 



a 2 (l 



-pt) 



and k — 4:fj,/a 2 degrees of freedom. The mean first-passage ti me through a 



consta nt boundary b is finite; an explicit expression is given in lLanskv et al 
(Il995l) . 



The likelihood function (O can be approximated as in Sections 12.11 and 12.2 
In particular, 



P(T b < A|z,y)«exp 



,(Vb-^)(Vb-Vyy 



(1 + A&) 



(17) 
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Table 3: SR model. Results of the simulation study. 6 indicates either /i. /3 or 



a. 





par 


true 


avg(0) 


avg(6)-e 

e 


sd(0) 


(2.5%, 97.5%) 


Xq 


b E(X oc ) 


A 


E(A0 




M 


10 


11.874 


0.187 


7.363 


(2.240, 29.494) 










CASE 1 




1.2 


1.331 


0.109 


0.999 


(0.000, 3.571) 


5 


10 8.3 


0.08 


51.4 




a 


0.7 


0.686 


-0.019 


0.093 


(0.493, 0.875) 














6 


7.154 


0.192 


4.416 


(1.607, 17.945) 










CASE 2 




0.31 


0.351 


0.131 


0.290 


(0.000, 1.021) 


10 


20 19.4 


0.12 


62.8 




o 


0.5 


0.490 


-0.018 


0.052 


(0.386, 0.595) 














2 


4.054 


1.027 


3.356 


(0.863, 13.198) 










CASE 3 




0.05 


0.163 


2.257 


0.214 


(0.000, 0.730) 


10 


20 40 


0.08 


96.4 




(j 


0.5 


0.495 


-0.010 


0.041 


(0.412, 0.576) 











with 



<t>b = -{[Vx~(b-y) (vP 2 b - 3<r 2 /x + —a 4 + 3 M 



y/y(b-x) [x/3 2 b - 3ct 2 m + — o 4 + Sfi 2 

Vb(x-y) (xyp* - 3a 2 M + ^<r 4 + 3^ ] } 

3 a 2 ^/~x~yb {-Jx - sjy) (-2Vb - +y/x + y/y) 



to be substituted in formula (jHJ, and Gg(A\x) may be deriv ed evaluating f ormul a 
(|10p . The function ^ r h(r , |a;) was calculated explicitly in iGiorno et all (|l989l ). 
As in the OU case, its numerical integration takes much longer time than the 
evaluation of the right hand side of (fT0|) using the cumulative non-central chi- 
square distribution function in the first term and numerical integration in the 
second. 

In Table [3] we present the results of a simulation study in three different 
parameter settings. A short account on a few numerical difficulties is given 
in Appendix IA.3I The choice of parameters is representative of the two main 
regimes: sub-threshold (Cases 1 and 2) and supra-threshold (Case 3). Results 
are similar to those in the previous examples. 



3.4 Bootstrap bias correction 

A common feature emerging from the examples is that MLE for killed processes 
may be heavily biased, largely variable and skewed. In some cases the bias is an 
effect of the sequential sampling: MLE is unbiased for the same model observed 
with a fixed length, even if short. It is the case of the WD, but also of the 
OU process wh e n the parameter (3 is known and only [i and a are estimated 



(jBibbona et all 120081 ) and of some discrete models such as multidimensional 
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random walks (jBibbona and Rubbal . 120111 ). The same "optional sampling ef- 



fect" was noti ced in MLE fo l lowin g a sequential test ([Whitehead! . I1986I) . in 



clinical trials (IJungand Kiml . 120041) and is known for binomial samples since 



Girshick et al 



( 19461) . For the OU and SR processes, on the contrary, even with 
a standard sampling scheme where trajectories are recorded up to a fixed but 
small time, bias, large variance and skewn ess would be eq u ally p resent. This 
phenomenon was illustrated for example in iTang and Chenl ( 20091 ) where para- 
metric bootstrap was shown to be effective in reducing the bias of the estimates 
for diffusion processes observed up to a short fixed time without increasing the 
variance too much. We will test the same method for our sequential problem. 
First we generate our simulated sample with parameter 6 (3,000 trajectories). 
For each path we first calculate the MLE 8 and then we generate a bootstrap 
sample simulating 1,000 trajectories with the estimated parameter. On the boot- 
strap sample we estimate again path by path and calculate the average estimated 
bootstrap parameter avg(f?s). The bias at 9 is evaluated as bias(0) = avg(f?s)— # 
and the estimate 8 is corrected to 9 BC = 9 — bias(6>) = 29 — avg(6>s) assuming 
that the bias at 9o and that at 9 are not much different. Results arc illustrated in 
Table |4] The relative efficiency is defined as the ratio between the determinants 
of the sample mean square error matrices of the two estimators. It is geomet- 
rically interpreted as the squared ratio between the area of the co ncentration 
ellipsoids of the random vectors 9 BC — 9q and 9 — 9q ( Cramer (1946)). When it 
is smaller then one, the bootstrap corrected estimators are more concentrated 
than their original counterparts. This method provides a better comparison be- 
tween two families of joint estimators w.r.t. the individual mean squared errors 
since it takes into account not only the marginal distribution of the estimates, 
but also their joint behavior. Our simulations confirm that parametric boot- 
strap is very effective in removing the bias but the increase in variability is not 
always negligible. 



4 Consistency and asymptotic normality 

For processes killed at a threshold standard asymptotics do not apply. When 
the process is killed no further information can be obtained, and the asymp- 
totic scenario of number of observations going to infinity (for fixed sampling 
interval) is no longer relevant. Moreover, on a fixed interval [0,nA], increasing 
n by decreasing A does not improve estimators of drift parameters, which are 
inconsistent even for the fully continuously observed trajectory. Asymptotic 
properties of MLEs may nevertheless be exploited by considering a collection 
of independent realizations of the killed process and drawing inference from the 
global log-likelihood function of the entire sample. This kind of asymptotics 
has alre ady been intr oduced for regular Markov chains with a finite number of 



states in iBhati (119611 ) 



Assume a sample of m independent discretely observed trajectories of X. 
Since every trajectory may be considered as a single independent random vari- 
able according to (|5|) (the choice in the following) or © , the global log-likelihood 
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Table 4: Bootstrap. Results of the simulation study. 9 indicates either fi, f3 or 
g. 





par 


true 


avg(0J 


avg(W J 


hdyu ) 


SCl^t/ J 


rcl.cff. 






0.43 


0.745 


0.431 


0.524 


0.549 




OU CASE 1 


P 


0.05 


0.080 


0.051 


0.078 


0.095 


0.927 




a 


1.2 


1.197 


1.201 


0.052 


0.052 








1 


1.324 


1.016 


0.596 


0.644 




OU CASE 2 




0.025 


0.062 


0.030 


0.081 


0.100 


1.202 




a 


1 


0.992 


1.001 


0.071 


0.072 






11 


10 


12.014 


9.887 


7.774 


8.584 




SR CASE 1 




1.2 


1.352 


1.174 


1.044 


1.181 


0.990 




a 


0.7 


0.690 


0.707 


0.093 


0.093 








2 


4.112 


2.341 


3.372 


3.946 




SR CASE 3 


P 


0.05 


0.167 


0.075 


0.215 


0.264 


1.025 




a 


0.5 


0.495 


0.500 


0.041 


0.041 





function can be written as 

m 

l m (X^,--- ,X^;9) =]TlogL(A«;#) 

where L(X^;9) is the likelihood © of a single trajectory. A function h : 
E x — > R is called locally dominated integrable w.r.t. a measure /i if for each 
6' E 9 there exists a neighborhood Ugi of 9' and a non-negative /x-integrable 
function kgi : E — > R such that \h(x, 9)\ < kg' (x) for all x £ E and all 9 £ Ug>. 



Standard th e orems for i.i.d. samples ([Cramerl ()19461 ) or the more modern 



SorenserJ ( 19991 ) ; ISorensenl ( 20 111 ) ) guarantee that a consistent root of the likeli 



hood equation exists which is asymptotically normal and efficient provided the 
following condition is fulfilled. 

Condition 4.1 

1. The function L(X;9) is twice continuously differentiable w.r.t. 9 for all 

XeU? =1 (E b y. 

2. For every fixed X £ UjliC^fO^ > the functions dg i L(X;9), de i e j L(X;9), 
i,j £ {1, . . . ,p}, are locally dominated integrable w.r.t. the measure ©"^A 1 . 
Moreover, the functions dg i g j logL(X;9) are locally dominated integrable 
w.r.t. the measure induced by L(X;9q). 

3. The information matrix with elements mij = Eg [9^ log L(X; 9) dg i log L(X; 0)] 
is finite and positive definite, where Eg (-) denotes expectation w.r.t. Pg . 

Conditions 14 . 1 1 (fT) and 14. II ([2]) arc easily transferred to the transition density 
||1J}, while Condition IQ1 is unmanageable due to the complicated expression 
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of the likelihood ([7]). We will show that it is equivalent to a simpler one that 
only involves the transitions (|4|). 

Under suitable regularity conditions (interchange of integration and differ- 
entiation), for all i £ {1, ■ • ■ ,p} and for any 3; 6 £j, we have 

Eg[d e , \ogf£(X t+A ,A\X t =x)] 

d e Aogf b (y 7 A\x)f b (y,A\x)dy + d e AogG b (A\x)G b (A\x) = 0, (18) 

E b 

and Eg [dg\ogL(X; 9)] = (the null vector) as always happens with regular 
score functions. 



The elements 



of the information matrix can be calculated as follows: 



--E e [d e \ogL{x ] e)d e \ogL{x-e)} 

5^fy 4 log/£(z ,A|a: _i) 



E 

new 

E 

00 



.1=1 
n-1 



n-1 



^dgXogfUxh.A^h^) 
h=l 

E d h lo sfe( x a, ^\xa-i)de } logfg(x a , A|x _i) 



(IP 



O.n 



de\ogG b e {A\x n ^)d e \ogG b e {A\x 7 , 



de, log fe (x a ,A\x a -i) d 9j log fg{x a , A\x a - 



dP 8 ,n( X ) 



x /g(.T a _i, (a - l)A\x Q )fg(x a , A|.T a _i) <fa a dcc a _i 

00 „ 

+ E / <9 ei logG^(A| a;il _ 1 )a e JogG^(A| a;n _ 1 )/ e b (x„_ 1 ,(n-l)A|.To) 

x G^AI^-i)^-! + fy>gG$(A|x )fy>gG$(A|a;o)G$(A|a:o). 

where dP s , n {X) = f^(xi,A\x ) ■ ■ ■ A|a;„_ 2 ) G^(A|x„_i) etei • • • cfe„_i 

and 9 is evaluated at £>o- The second equality holds by definition. The third 
equality follows from (|18j) when a and h arc different. The fourth equality is 
obtained inverting the order of the summations and performing integration over 
all variables except for x a and x a -i using the following Chapman-Kolmogorov 
relations 



f b { Vl t\x)f b (x, s\z)dx = f b (y, t + s\z) 



G b (t\x)f b (x, s\z)dx = G b (t + s\z) - G b (s\z), 

and that since the crossing of the threshold is a sure event, we have 

00 

J2 [G b e ((n - a)A\x a ) - G b ((n -a- l)A\x a )] = 1. 

n— a+1 
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We rename the variables x a -i and x a in the right hand side of the last equality 
of rriij by x and y, using that the process is time homogenous. We therefore 
have 

( . oo 

dg\ogf b {y, A\x)dg\ogf b g{y, A\x)f b e (y, A|x)V/ e b (x, nA\x Q )dydx 



d 8l \ogG b e {A\x)d Bj lo g G b g (A\x)G b e (A\x)Y,fe^,nA\x )dx 
deAogG b (A\x )d e] \ogG b (A\xo)G b (A\xo). (19) 



For the matrix elements rriij to be finite we need Y^=ofe \ x i n A\x$) to con- 
verge to a function v{x). Then J E v(x)dx = Yl™=o Ie fe {x,nA\xo)dx = 
^o p ( N > Thus > ^e (N)" = En=o p ( N > nA)°= j Eb v{x)dx + 

y^ n _ 1 P{N = n). Since the passage is a sure event, the normalized function 

Mx) = Mao 

is a probability density on Ejj U C, and the function 

Q£(x,y)=ire(x)f$(y,A\x) 

is a joint probability density on (E^UC) 2 . Finiteness of Eg (N) and convergence 
of the above series are equivalent. Thus, if Eg (N) < oo (and so Eg (Ti,) < oo) 
we can recast formula (|19j) into the following form 

d 6t \ogfe(y, A\x) c> .log f$(y, A\x) Q%(x, y) dydx. (20) 

(E b UC) 2 

Finiteness of rriij is now restated as the square integrability of dglogfg (y, A\x) 
evaluated at 0q w.r.t. the measure with density Q^ o (x 7 y). 

We have proved that the following condition is equivalent to Condition 14. II 

Condition 4.2 

1. fg(x,y) is twice continuously differentiable w.r.t. 8 for all (x,y) G (Ej, U 
C) 2 . " 

2. For every fixed y £ E^, the functions dg^^x^y), dg^^^x^y), i,j € 
{1, ...,p}, are locally dominated integrable w.r.t. the measure A + 5c- 
Moreover, the functions dg i g j Tog fg (x, y) are locally dominated integrable 
w.r.t. Qf Q (x,y). 

3. Eg {T b ) < oo. 

4- The information matrix with elements (|20p is finite and positive definite. 
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The probabilistic meaning of and Condition 14.21 is as follows. 



Remark 4.1 The probability density TTg (x) can be interpreted as the density 
of the stationary measure of a regenerative process which coincides with X.\ up 
to the first time N it is in C , at the next step it is reset to Xq with prob ability 
one, and then it starts anew with the same law i Mevn and Tweedit , 200& , Thm. 
10.0.1). Indeed, Condition ^. 2\ ^) ensures that such a process is stationary and 
that we (x) is well defined. Q^ o (x,y) coincides with the joint stationary density 
of two consecutive observations from the regenerative process whenever x G Eb. 
Asymptotic properties for many trajectories could have been derived directly for 
a sample of such a stationary re generative process which is observ ed for a long 
time under the same hypotheses iSdrensen . \1999\ : \S0rens e^. \20li ). 



For diffusion processes the transition densities are often not known in explicit 
form. Parameter estimation could then be based on some martingale estimating 
functions which play the role of the score vector. If we can find a martingale 
estimating function for the killed process Xf , a straightforward modification of 
Condition 14.21 still guarantees that a consistent root of the estimating equation 
exists, which is asymptotically normal when the number of trajectories becomes 
large. 



5 Examples with multiple trajectories 

Based on the asymptotic results in the previous Section, one might be encour- 
aged to use MLE to estimate from a sample of multiple trajectories. However, 
the number of trajectories is always finite, and when sampling from a diffusion 
process, the sampling interval cannot be as small as we like. To assess the va- 
lidity of the approximations introduced for non-trivial diffusions, we return to 
simulations. The simulated samples from Section [2] are used again, but now the 
samples are divided in groups of m trajectories. We estimate from each group 
by maximizing its global likelihood. Different values of m arc used (in particular 
m = 1, 3, 10, 30, 100) to show which is the number of trajectories needed to get 
reliable results. We only present the most relevant cases. In particular, WD is 
not illustrated since all effects are visible in the more interesting models. 

5.1 OU model 

In the upper panels of Figure[TJ density plots of the relative estimates (9 m — ff)/6 
are displayed, where 9 rn is the estimated value from a group of m trajectories. 
The true parameters are those of Case 1 in Table [5J When estimating from 
single trajectories, the probability of getting an estimate which differs from the 
true value by more than 100% is high. Increasing m the distribution of the 
estimates becomes more symmetric and concentrated around the true value. 
Even for m = 3 or 10, the quality of the estimator is considerably increased, 
and when m = 30 or 100, parameters are well estimated. 
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Density plots of the estimates 




P 





1 2 3 -2 2 4 -0.1 -0.05 0.05 0.1 

Normal Q-Q plots of the estimates from single trajectories 




-4 -2 2 4 -4 -2 2 4 -4 -2 2 4 

Normal Q-Q Plot of the estimates from groups of 100 trajectories 




-1 o 



-1 1 



-2-10 1 2 



Figure 1: OU model, Case 1. Upper panels: density plots of the relative esti- 
mates (8 m —8)/8, where 8 m is the estimated value from a group of m trajectories, 
for in = 1,3, 10, 30 and 100 (from less to more peaked). Lower panels: normal 
Q-Q plots of the estimates, from single trajectories (middle) and from groups 
of 100 trajectories (lower panels). 
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Likelihood naive 
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Figure 2: OU model, Case 1. Confidence intervals for the mean relative bias 
(6 — 6)/ '9 of MLE and naive estimates by groups of trajectories, against to, the 
number of trajectories used in each group. 



In the lower panels of Figure [TJ normal Q-Q plots of the estimates are dis- 
played, first from single trajectories and then from groups of 100 trajectories. 
While we are far from normality in the first case, an approximately normal 
distribution is achieved for to = 100. 

Figure [2] illustrates confidence intervals for the mean relative bias for the 
drift parameters of the MLE and of the naive method. They are calculated 
according to the following formula 

avg(fl m ) - £(o.975,fc-i)sd(6> m ) 

where to is the number of trajectories per group, and the integer k ~ 10, 000/m 
is the number of repetitions. Here, i(o.975.fc-i) is the 97.5% quantile of the t- 
distribution with k — 1 degrees of freedom. This approximation formula holds 
when to = 100 since we have approximately normal estimates and k = 100 
groups, while when to = 1 the estimates arc far from normal, but the sample is 
now large (k = 10,000). 

If the true likelihood is used, the large bias in the estimated drift parameters 
from single trajectories already disappears for to = 30. The amplitude of the 
confidence interval does not change much when we use large numbers of trajec- 
tories, since the reduction of the variance of 9 m is compensated by the increase 
in the factor 1 / y/k. 
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p 



1 1 



1 3 10 30 100 



1 3 10 30 100 



1 3 10 30 100 



Figure 3: OU model. Case 4. Confidence intervals of the mean relative bias 
(0 — 9) / 6 estimated from groups of trajectories with a non- infinitesimal sampling 
step (A = 0.49) against m, the number of trajectories used in the estimation. 



Increasing m it becomes apparent that the naive estimators for the drift 
parameters settle to some constant level, which does not always coincide with 
the true values: (3 differs from (3 by 12% in the case plotted. 

The quality of the estimator is sensitive to the approximations introduced. 
Consider Case 4 in Table [5J where the step of the discretization is A = 0.49, 
and thus not small. Density plots and normal Q-Q plots are similar to those 
illustrated for Case 1 (not shown). Confidence intervals of the mean relative 
bias are plotted in Figure [3J Parameters are chosen such that the mean of the 
sample size N in each trajectory is comparable to the other cases, and thus, the 
observation intervals (NA) are longer. The estimates of the drift parameters 
from single trajectories are consequently better. When estimating from many 
trajectories, however, some very small asymptotic bias is now visible even when 
estimating from our best approximation of the likelihood. Nevertheless, the 
method is robust and the bias is less than 1%. 

To conclude, 30 trajectories are enough to get approximately unbiased es- 
timates, and for m = 100, the estimator is also approximately normally dis- 
tributed. 

5.2 Square root model 

Despite minor numerical complications (Appendix IA.3|) . the SR model behaves 
essentially like the OU model, cf. Figure @] The main difference is that the 
estimator of a has larger variance. Again 30 trajectories are enough for unbi- 
ascdness and normality is almost achieved with m = 100. 

6 Intracellular recordings from a motoneuron 

The membrane potential from a spinal motoneuron in segment D10 of an adult 
red-cared turtle ( Trachemys scripta elegans) was recorded while a periodic me- 
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Density plots of the estimates 




chanical stimulus was applied to selected regions of the carapace with a sampling 
step of 0.1 ms (for details see lBerg et al.l (|2007l 120081) ). The turtle responds to 
the stimulus with a reflex movement of a limb known as the scratch reflex, 
causing an intense synaptic input to the recorded neuron. Due to the time 
varying stimulus, a model for t he complete data set needs to incorporate the 



time-inhomogcneity, as done in iJahn et al.l (|201lr ). The data can only be as 



sumcd stationary during short time windows, which is required for the models 
introduced in Sections [3.21 and EOl called Leaky Integrate-an-Fire (LIF) models 
in computational neuroscience. Here, the crossing of the threshold corresponds 
to a firing of the neuron (a spike), and a trajectory corresponds to an interspike 
interval. In the LIF models the intensity of the synaptic input is given by the 
parameters /i and a, and under stimulation they are both higher, causing the 
mean membrane potential to increase. Also /3, the inverse of the time constant, 
depends on the conductance and is state dependent. Thus, the parameters 
are influenced by the intensity of the input and cannot be considered constant 
along the experiment since the stimulation is varying. During intense stimula- 
tion, however, the neuron fires frequently and the typical length of an interspike 
interval can be assumed smaller than the time scale of the variability of the 
input. Therefore, we consider the input approximat ely constant for at least 3-4 
consecutive trajectories during on-cycles (following I Jahn et al.l ( 201ll )). From 
non-parametric estimation it was shown that after a translation of the data 
which set the zero level of the potential (the inhibitory reversal potential) to 
approximativcly —74.5 mV, a LIF model based on a SR process des cribes the 
data in the on-cycles better than the OU model ( Jahn et al. ( 201lh ). Spikes 
are easily identified and the beginning of each trajectory was fixed to the first 
recorded point after the spike that is above —60 mV (15.4 mV after translation). 
The threshold is accommodated manually just above the highest local maximum 
of each group of homogeneous consecutive trajectories within the same on-cycle. 
Parameter estimation is performed path by path both with the full MLE and 
with the naive approximation. The analyzed data are plotted in Fig. [5] and 
results are presented in Table [3] together with a global estimate from all 3-4 
consecutive trajectories as a unique sample according to the method in Section 
131 The sample is small and the asymptotic regime is still not reached, but ac- 
cording to the simulation results in Section 15.21 the quality of the estimate is 
substantially improved. For these data, the naive estimates do not differ much 
from the MLEs, probably because the trajectories are relatively long, on average 
342 points corresponding to 34.2 ms. The individual estimates done trajectory 
by trajectory show large statistical fluctuations, which should be reduced in the 
global estimates, as well as the small sample bias. 



7 Conclusion 

Drawing inference for processes which are killed the first time they cross a 
threshold revealed an intriguing task. In the neurobiological literature where 
the problem is often encountered, estimation has always been performed ig- 
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Figure 5: Four cycles of intracellular recordings of the membrane potential of a 
motoneuron during mechanical stimulation obtained from an isolated carapace- 
spinal cord preparation from adult turtles. The dashed lines are the thresholds 
used in the analysis. 



Table 5: Experimental data from a motoneuron fitted with a SR model. The 
starting point xq is different trajectory by trajectory, but always around 15.5 
mV. The naive estimate of a always coincides with the MLE. 













a 






b 


cycle 


traj 


MLE 


naive 


MLE 


naive 






global MLE 






1 


3.53 


3.57 


0.132 


0.134 


0.118 








1 


2 


3.14 


3.19 


0.112 


0.114 


0.096 


4.43 


0.164 0.108 


29.4 




3 


5.83 


5.96 


0.221 


0.227 


0.104 




4 


8.35 


8.57 


0.317 


0.327 


0.112 










1 


6.88 


6.91 


0.242 


0.243 


0.110 








2 


2 


4.35 


4.56 


0.155 


0.164 


0.117 


5.76 


0.203 0.108 


30.8 




3 


4.92 


4.97 


0.177 


0.178 


0.100 




4 


9.73 


9.80 


0.319 


0.322 


0.090 










1 


4.87 


4.90 


0.167 


0.168 


0.113 








3 


2 


4.03 


4.08 


0.149 


0.151 


0.103 


4.18 


0.148 0.105 


31.4 




3 


4.34 


4.08 


0.151 


0.153 


0.099 










1 


6.18 


6.28 


0.222 


0.226 


0.118 










2 


1.43 


1.50 


0.044 


0.047 


0.124 


3.14 


0.106 0.119 


30.9 


4 






3 


1.15 


1.12 


0.001 


0.000 


0.100 




4 


6.30 


6.44 


0.226 


0.232 


0.114 
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noring the presence of the threshold, assuming wrongly that the sample is 
dr awn from an uncons t rained process. This appro ach was already criticized 



Bibbona et all (|2008l l2010h : iGiraudo et~aT1 (|201lh . but no general method is 



available in the statistical literature for this problem. We compute the correct 
likelihood accounting for the presence of the threshold, and show that even if 
the model is correctly specified, estimates from a single trajectory are poor. 
Bootstrap bias correction may be applied to improve estimators. Standard 
asymptotic theory does not apply to a single trajectory. The problem is solved 
and good estimates can be achieved if we can infer from a large sample made of 
many trajectories. Asymptotic results are proved in this case, and we show nu- 
merical evidence that 30-100 trajectories are enough to get reliable estimates. 
Even with 3-10 trajectories inference is greatly improved. In neuroscience, 
sample sizes of 50-1,000 repetitions of inter-spike intervals are common. This 
emphasizes the importance of knowing how to implement the correct likelihood. 

Many interesting questions remain open. In neuroscience applications, even 
if many trajectories are recorded, it might be questionable to assume that pa- 
rameters are stable along repetitions. For the data set analyzed in Section [5] 
the problem is a ppare nt. It would be appropriate to incorporate random effects 



(Picchini et al 



els (Jahnetal 



2008) or to consider more complicated inhomogeneous mod- 



20111 ). Moreover, a more flexible spike generation mechanism 



would proba bly be more realis tic and different models have been designed to 
this aim, cf. (jJahn et all 120111) and references therein. The lesson learned here 
is that a correct specification of the likelihood function which incorporates the 
presence of the threshold improve the estimates. 

In a few special cases, some ad-hoc methods that were designed for sequential 
analysis were shown t o provide unbiased estimators for a s i ngle trajectory of cer- 
tain k illed processes (IGirshick et all Il946t iFerebed . Il983t iBibbona and Rubbal . 
1 2 llh . It would be useful to find unbiased estimators for non-trivial diffusions 
also, and to provide a detailed comparison with the likelihood approach, espe- 
cially for those experiments where only one trajectory is available. Moreover, 
for most continuous time models, the transition density is not available, and 
the estimating function approach can be a good solution. To this aim suitable 
estimating functions have to be designed, which might also allow to remove the 
high frequency assumption. The challenge is to find explicit expressions for 
conditional moments of suitable functionals from the constrained distribution. 
Finally, the numerical approximations to the crossing probabilities in the OU 
and the SR models are only valid for small sampling steps. This is not a prob- 
lem for ncurophysiological data, which arc typically high frequency, but in other 
applications it might be a major drawback. 
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Table 6: Comparison between theoretical mean first passage step and sample 
averages. 





CASE 1 


CASE 2 


CASE 3 


CASE 4 


E(N) 


33.83 


33.83 


100.50 


98.99 


sample average 


33.72 


34.28 


100.73 


101.61 



A Numerical details 

In this Section we provide some details about the numerical procedure used 
in the simulation studies, both to simulate the sample and to compute the 
estimated values. We worked in the R environment. Many of our routines have 
been designed modifying and adapting functions that were im plemented in the 



sde package which is thoroughly documented in llacusl (|2008|) . 



A.l Simulations 

The simulation of diffusion processes up to their first-passage time through a 
barrier b requires some care. If at a given time the process was at level x n < b, 
and we generate the next point and find £ n +i < b we cannot assure that the 
underlying continuous process did not cross the barrier between the two points. 
If we stop the simulation only when x n +i > b, we significantly overestimate the 
first-pa ssage time. To solve this problem two competitive method s were pro- 
posed (jGiraudo and Sacerdotd . Il999t iBaldi and Caramellinol l2002h . For each 



couple of simulated points x n and x n+ \ (if both are below b), the probability p 
of the process crossing the threshold between the two points is evaluated and 
a corresponding Bernoulli random variable is generated: if you get 1 a crossing 
occurred and x n is the last point of the path, while means no crossing and 
the simulation continues. The first method is slightly more accurate when the 
discretization step gets larger, the second much faster to compute. We choose 
the second. To avoid influence of the use of the same approximation both in the 
simulation scheme and in the estimation method, we simulated with a smaller 
discretization step w.r.t. the one considered for estimation. To assess the accu- 
racy of the simulation we compare the mean first-passage time estimated from 
the simulations with the one prescribed by the theory. In particular, for the 
WD we calculate analytically the probability that the passage occurs between 
two steps of the discretization and the quantity 

oo 

E(JV) = nA ■ p (( n - 1)A < T < nA), 

n=l 

which is a discretized version of the mean first-passage time. A comparison 
between E(N) and its sample values derived from the simulations shows good 
agreement as reported in Table [6j the second row is already reported in Table 
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A. 2 Different implementation of formula (ITU]) 



Another approximation of the distribution of the first-crossing time is the fol- 
lowing, 

rS 

G b e (A\x) = 1 - / f b e (X A =y\X = x) dy. (21) 

J —oo 

In most cases the numerical evaluation of this integral is much slower than 
using ([TO]) without providing better performance. Nevertheless, there might 
be occasions where this alternative turns out to be useful. In particular, the 
possibility of calculating one of the integrals in (flU)) or (|2"Tj) analytically would 
drastically speed up the algorithm. 

In particular, for the OU process we can approximate fg(y, A|x) in (|21[) by 
expression (TT2"j) for the Wiener process, which is the first order approximation 
in A of (TlSl) , and we can calculate the integral analytically. If this expression 
replaces (flUl) . the algorithm becomes significantly faster but less precise, espe- 
cially if the discretization step is not extremely small. Numerical evaluation of 
(jlOp in the parameter settings used here is in any case reasonably fast so we 
suggest its adoption. 



A. 3 Minimization algorithm 

To minimize the negative log-likelihood function we used the standard Nelder- 
Mead algorithm provided by the R function optim. There are restrictions on 
the admissible values for some parameters: In the SR model /i < <r 2 /2, and 
a and /3 have to be positive in the SR and in the OU model. The likelihood 
function is evaluated as NA (missing value) if the minimizer tries to calculate 
it for parameters out of this range, and the minimum is calculated just among 
the admissible values. The effect of this constraint can be seen in Figures Q] 
and |4] both in the densities and in the Q-Q plots relative to estimation from 
a single trajectory. As soon as the estimates get better the effect is lost. The 
box-constrained algorithm, which is denoted "L-BFGS-B" in R, turned out not 
to be feasible as it halts if the likelihood function returns NAs of infinite values. 
Especially for the SR model the transition density (even in absence of a barrier) 
might be problematic to evaluate when the parameters provided by the mini- 
mizer are not close to the true ones. In this case we need to be sure that the 
function returns NA when it is not evaluated with satisfactory precision. For the 
transition density of the SR model the R functions dchisq and pchisq turned 
out t o be the best choice among the different possibilities discussed in ( Iacud . 



20081 Section 3.1.3). Nealder-Mead minimizers require reasonable initial values. 



These values are provided by estimators that can be calculated explicitly. We 
used the standard choices suggested in the literature in absence of a barrier. 
The initial estimators for the WD model are 

Xn ~2 J2iLi(Xi — Xi_i — p, A) 2 

a = 



(iV-l)A' (N-1)A 
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The initial estimators for the OU model are 



(TV - 1) A 



The initial estimators for the SR model are 



A 



lo 



i=l 



JV 



-^(Xjv-Xo) 



AT^l-e-M) ' 



29 



